Search | WHO COVID-19 Research Database

An Open-Domain QA System for e-Governance

Ion, R.; Avram, A. M.; Păiș, V.; Mitrofan, M.; Mititelu, V. B.; Irimia, E.; Badea, V.

5th International Conference on Computational Linguistics in Bulgaria, CLIB 2022: 105-112, 2022.

Article in English | Scopus | ID: covidwho-2168598

ABSTRACT

The paper presents an open-domain Question Answering (QA) system for Romanian that answers COVID-19-related questions. The QA pipeline comprises automatic question processing, automatic query generation, a web search for the 10 most relevant documents, and answer extraction with a BERT model fine-tuned for extractive QA on a COVID-19 data set that we created manually. The paper describes the QA system and its integration with RELATE, the Romanian language technologies portal, as well as the COVID-19 data set and several evaluations of QA performance. © 2022, Institute for Bulgarian Language. All rights reserved.
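
The four stages named in the abstract (question processing, query generation, top-10 document search, answer extraction) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the English stop-word list, the in-memory corpus that stands in for web search, and the keyword-overlap extractor that stands in for the fine-tuned BERT model. It is not the authors' implementation and does not use the RELATE API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Document = Tuple[str, str]  # (source name, text); hypothetical representation


@dataclass
class Answer:
    text: str
    score: float
    source: str


# Toy stop-word list (assumption; a real system would use Romanian resources).
STOP_WORDS = {"what", "is", "the", "a", "of", "for", "how", "are", "in"}


def process_question(text: str) -> List[str]:
    """Stage 1 (toy): lowercase, strip punctuation, drop stop words."""
    tokens = [t.strip("?.,!").lower() for t in text.split()]
    return [t for t in tokens if t and t not in STOP_WORDS]


def generate_query(keywords: List[str]) -> str:
    """Stage 2 (toy): join the keywords into a search query string."""
    return " ".join(keywords)


def make_toy_search(corpus: Dict[str, str]) -> Callable[[str], List[Document]]:
    """Stage 3 stand-in: rank in-memory documents by query-term overlap."""
    def search(query: str) -> List[Document]:
        terms = set(query.split())
        scored = [(len(terms & set(process_question(text))), name, text)
                  for name, text in corpus.items()]
        scored.sort(key=lambda item: -item[0])  # most matching terms first
        return [(name, text) for hits, name, text in scored if hits > 0]
    return search


def toy_extract(question: str, document: Document) -> Optional[Answer]:
    """Stage 4 stand-in for the extractive QA model: best-overlap sentence."""
    name, text = document
    keywords = set(process_question(question))
    best: Optional[Answer] = None
    for sentence in (s.strip() for s in text.split(".")):
        if not sentence:
            continue
        overlap = len(keywords & set(process_question(sentence)))
        if overlap > 0 and (best is None or overlap > best.score):
            best = Answer(sentence, float(overlap), name)
    return best


def answer_question(question: str,
                    search: Callable[[str], List[Document]],
                    extract: Callable[[str, Document], Optional[Answer]],
                    top_k: int = 10) -> Optional[Answer]:
    """Run all four stages and return the highest-scoring answer, if any."""
    query = generate_query(process_question(question))
    documents = search(query)[:top_k]  # keep only the top_k documents
    candidates = [extract(question, doc) for doc in documents]
    found = [c for c in candidates if c is not None]
    return max(found, key=lambda a: a.score, default=None)


toy_corpus = {
    "doc_a": "Masks reduce transmission. Wash hands often.",
    "doc_b": "The weather is sunny.",
}
result = answer_question("How do masks reduce transmission?",
                         make_toy_search(toy_corpus), toy_extract)
```

Passing `search` and `extract` as callables keeps the stages replaceable, so a real web search client or a fine-tuned extractive QA model could be substituted for the toy stand-ins without changing `answer_question`.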

Improved Text Normalization and Language Models for SpeeD's Automatic Speech Recognition System

Manolache, C.; Georgescu, A. L.; Cucu, H.; Mititelu, V. B.; Burileanu, C.

Proceedings of the 15th International Conference Linguistic Resources and Tools for Natural Language Processing: 115-128, 2020.

Article in English | Web of Science | ID: covidwho-1285867

ABSTRACT

Automatic speech recognition (ASR) systems that use word-based language models require periodic updates to include new named entities (e.g., coronavirus, COVID-19) or collocations. Moreover, for the Romanian language in particular, new hyphenated words pose additional problems. Our study presents SpeeD's efforts in collecting new text corpora and using them for language modelling in ASR. We also present the improvements made to the text normalization module to address the problems posed by hyphenated words. We evaluate the resulting language models both in terms of their ability to predict future words (perplexity and out-of-vocabulary rate) and in terms of their usefulness in ASR (word error rate). We report relative ASR improvements of around 10% for spontaneous speech, with small degradations for read speech.
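
The evaluation measures named in the abstract (out-of-vocabulary rate, perplexity, word error rate, and relative improvement) have standard textbook definitions, sketched below. The function names and the toy inputs are assumptions made for illustration; the abstract does not say which tools or exact formulations the authors used.

```python
import math
from typing import Sequence, Set


def oov_rate(tokens: Sequence[str], vocabulary: Set[str]) -> float:
    """Fraction of test tokens that are missing from the LM vocabulary."""
    if not tokens:
        raise ValueError("tokens must not be empty")
    return sum(1 for t in tokens if t not in vocabulary) / len(tokens)


def perplexity(log_probs: Sequence[float]) -> float:
    """exp of the negative mean natural-log probability per word."""
    if not log_probs:
        raise ValueError("log_probs must not be empty")
    return math.exp(-sum(log_probs) / len(log_probs))


def word_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """Word-level edit distance (sub + del + ins) divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    # Rolling-row Levenshtein distance over words.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        current = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # match or substitution
        previous = current
    return previous[-1] / len(reference)


def relative_improvement(baseline: float, new: float) -> float:
    """Relative reduction of an error metric with respect to the baseline."""
    return (baseline - new) / baseline
```

Under these definitions, a hypothetical drop in word error rate from 20.0% to 18.0% would be a relative improvement of about 10%, which shows how a figure like the one reported in the abstract is computed; the numbers 20.0 and 18.0 are made up for the example.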
